## the datatable function from the DT package creates an HTML widget display of the dataset
## install the DT package if it is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |>
  DT::datatable()
HR Analytics Employee Attrition And Performance
BCon 147: special topics
1 Project overview
In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.
2 Scenario
Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.
Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.
3 Understanding data source
The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.
This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.
4 Data wrangling and management
Libraries
Before we start working on the dataset, we need to load the libraries that will be used for data wrangling, analysis, and visualization. Load all of them here. Packages that are not yet installed can be installed with the install.packages function. Some packages are only needed later in this project, so install them as needed and load them here as well.
# load all your libraries here
options(repos = c(CRAN = "https://cran.rstudio.com"))
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("sjPlot")) install.packages("sjPlot")
if (!require("ggstatsplot")) install.packages("ggstatsplot")
if (!require("dplyr")) install.packages("dplyr")
if (!require("tidyr")) install.packages("tidyr")
if (!require("GGally")) install.packages("GGally")
if (!require("report")) install.packages("report")
library(dplyr)
library(ggplot2)
library(DT)
library(readxl)
library(janitor)
library(GGally)
library(sjPlot)
library(report)
library(ggstatsplot)
library(tidyr)
4.1 Data importation
Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.
Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more information about the left_join function here.
Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from the DT package.
## import the two data here
library(readr)
## read each file once, from the dataset folder, straight into the required names
employee_dta <- read_csv("dataset/Employee.csv")
perf_rating_dta <- read_csv("dataset/PerformanceRating.csv")
## merge employee_dta and perf_rating_dta using left_join function.
## save the merged dataset as hr_perf_dta
hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")
print(hr_perf_dta)
# A tibble: 6,899 × 33
EmployeeID FirstName LastName Gender Age BusinessTravel Department
<chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
2 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
3 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
4 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
5 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
6 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
7 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
8 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
9 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
10 CBCB-9C9D Leonerd Aland Male 38 Some Travel Sales
# ℹ 6,889 more rows
# ℹ 26 more variables: `DistanceFromHome (KM)` <dbl>, State <chr>,
# Ethnicity <chr>, Education <dbl>, EducationField <chr>, JobRole <chr>,
# MaritalStatus <chr>, Salary <dbl>, StockOptionLevel <dbl>, OverTime <chr>,
# HireDate <chr>, Attrition <chr>, YearsAtCompany <dbl>,
# YearsInMostRecentRole <dbl>, YearsSinceLastPromotion <dbl>,
# YearsWithCurrManager <dbl>, PerformanceID <chr>, ReviewDate <chr>, …
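The repeated rows for the same employee in the output above are what a left join produces when the right-hand table holds several records per key. A minimal sketch with made-up data (the IDs, departments, and ratings below are hypothetical, not taken from the dataset) shows the behaviour:

```r
library(dplyr)

# Hypothetical toy tables: employee A1 has two reviews, employee B2 has none
employee_toy <- data.frame(
  EmployeeID = c("A1", "B2"),
  Department = c("Sales", "Technology")
)
perf_toy <- data.frame(
  EmployeeID    = c("A1", "A1"),
  PerformanceID = c("PR01", "PR02"),
  ManagerRating = c(3, 4)
)

# left_join keeps every row of the left table and repeats it once per match;
# unmatched employees are kept with NA in the right-hand columns
joined_toy <- left_join(employee_toy, perf_toy, by = "EmployeeID")
print(joined_toy)
```

Because each employee appears once per review, row counts taken from the merged table count reviews rather than distinct employees. Something like n_distinct(EmployeeID) would count people instead.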
## Use the datatable from DT package to display the merged dataset
datatable(hr_perf_dta)
4.2 Data management
Using the clean_names function from the janitor package, standardize the variable names by using the recommended naming of variables.
Save the renamed variables as hr_perf_dta to update the dataset.
## clean names using the janitor packages and save as hr_perf_dta
hr_perf_dta <- clean_names(hr_perf_dta)
## display the first rows of the renamed hr_perf_dta
head(hr_perf_dta)
# A tibble: 6 × 33
employee_id first_name last_name gender age business_travel department
<chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
2 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
3 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
4 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
5 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
6 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
# ℹ 26 more variables: distance_from_home_km <dbl>, state <chr>,
# ethnicity <chr>, education <dbl>, education_field <chr>, job_role <chr>,
# marital_status <chr>, salary <dbl>, stock_option_level <dbl>,
# over_time <chr>, hire_date <chr>, attrition <chr>, years_at_company <dbl>,
# years_in_most_recent_role <dbl>, years_since_last_promotion <dbl>,
# years_with_curr_manager <dbl>, performance_id <chr>, review_date <chr>,
# environment_satisfaction <dbl>, job_satisfaction <dbl>, …
Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.
Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.
Create new variables cat_work_life_balance, cat_self_rating, and cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.
Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0 and Yes = 1.
Save all the changes in hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.
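As an alternative to case_when, base R's factor() can attach labels to numeric codes in one call, and an ordered factor keeps the categories in their natural order in tables and plots. This is only a sketch on a made-up vector (the values are hypothetical), assuming the codes 1 to 5 listed above:

```r
# Hypothetical vector of coded education values, including one missing value
education <- c(1, 3, 5, 2, NA, 4)

education_labels <- c("No formal education", "High school", "Bachelor",
                      "Masters", "Doctorate")

# Map codes 1-5 to their labels; ordered = TRUE preserves the ranking
cat_education <- factor(education, levels = 1:5,
                        labels = education_labels, ordered = TRUE)

# Tabulate with NA shown, so unmatched or missing codes are easy to spot
print(table(cat_education, useNA = "ifany"))
```

The same pattern would apply to the satisfaction and rating variables by swapping in their label sets.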
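## create cat_education
## education is stored as numeric codes (1-5), so match on the codes and assign the labels
hr_perf_dta <- hr_perf_dta %>%
  mutate(
    cat_education = case_when(
      education == 1 ~ "No formal education",
      education == 2 ~ "High school",
      education == 3 ~ "Bachelor",
      education == 4 ~ "Masters",
      education == 5 ~ "Doctorate",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    )
  )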
## create cat_envi_sat, cat_job_sat, and cat_relation_sat
## the satisfaction variables are numeric codes (1-5); map each code to its label
hr_perf_dta <- hr_perf_dta %>%
  mutate(
    cat_envi_sat = case_when(
      environment_satisfaction == 1 ~ "Very dissatisfied",
      environment_satisfaction == 2 ~ "Dissatisfied",
      environment_satisfaction == 3 ~ "Neutral",
      environment_satisfaction == 4 ~ "Satisfied",
      environment_satisfaction == 5 ~ "Very satisfied",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    ),
    cat_job_sat = case_when(
      job_satisfaction == 1 ~ "Very dissatisfied",
      job_satisfaction == 2 ~ "Dissatisfied",
      job_satisfaction == 3 ~ "Neutral",
      job_satisfaction == 4 ~ "Satisfied",
      job_satisfaction == 5 ~ "Very satisfied",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    ),
    cat_relation_sat = case_when(
      relationship_satisfaction == 1 ~ "Very dissatisfied",
      relationship_satisfaction == 2 ~ "Dissatisfied",
      relationship_satisfaction == 3 ~ "Neutral",
      relationship_satisfaction == 4 ~ "Satisfied",
      relationship_satisfaction == 5 ~ "Very satisfied",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    )
  )
# Display the first few rows to check if the variables are created correctly
head(hr_perf_dta)
# A tibble: 6 × 37
employee_id first_name last_name gender age business_travel department
<chr> <chr> <chr> <chr> <dbl> <chr> <chr>
1 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
2 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
3 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
4 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
5 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
6 3012-1A41 Leonelle Simco Female 30 Some Travel Sales
# ℹ 30 more variables: distance_from_home_km <dbl>, state <chr>,
# ethnicity <chr>, education <dbl>, education_field <chr>, job_role <chr>,
# marital_status <chr>, salary <dbl>, stock_option_level <dbl>,
# over_time <chr>, hire_date <chr>, attrition <chr>, years_at_company <dbl>,
# years_in_most_recent_role <dbl>, years_since_last_promotion <dbl>,
# years_with_curr_manager <dbl>, performance_id <chr>, review_date <chr>,
# environment_satisfaction <dbl>, job_satisfaction <dbl>, …
## create cat_work_life_balance, cat_self_rating, and cat_manager_rating
## the rating variables are numeric codes (1-5); map each code to its label
hr_perf_dta <- hr_perf_dta %>%
  mutate(
    cat_work_life_balance = case_when(
      work_life_balance == 1 ~ "Unacceptable",
      work_life_balance == 2 ~ "Needs improvement",
      work_life_balance == 3 ~ "Meets expectation",
      work_life_balance == 4 ~ "Exceeds expectation",
      work_life_balance == 5 ~ "Above and beyond",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    ),
    cat_self_rating = case_when(
      self_rating == 1 ~ "Unacceptable",
      self_rating == 2 ~ "Needs improvement",
      self_rating == 3 ~ "Meets expectation",
      self_rating == 4 ~ "Exceeds expectation",
      self_rating == 5 ~ "Above and beyond",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    ),
    cat_manager_rating = case_when(
      manager_rating == 1 ~ "Unacceptable",
      manager_rating == 2 ~ "Needs improvement",
      manager_rating == 3 ~ "Meets expectation",
      manager_rating == 4 ~ "Exceeds expectation",
      manager_rating == 5 ~ "Above and beyond",
      TRUE ~ NA_character_ # Assign NA for any missing or unmatched values
    )
  )
## create bi_attrition
hr_perf_dta <- hr_perf_dta %>%
mutate(
bi_attrition = case_when(
attrition == "No" ~ 0,
attrition == "Yes" ~ 1,
TRUE ~ NA_real_ # Assign NA for any missing or unmatched values
)
)
## print the updated hr_perf_dta using datatable function
datatable(hr_perf_dta)
5 Exploratory data analysis
5.1 Descriptive statistics of employee attrition
Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.
Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate, group the dataset by job role. Afterward, you can use the count function to get the frequency of attrition for each job role and then divide it by the total number of observations. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.
The plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!
## selecting attrition key variables and save as `attrition_key_var_dta`
attrition_key_var_dta <- hr_perf_dta %>%
select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance, bi_attrition)
## compute the attrition rate across job_role and save as attrition_rate_job_role
attrition_rate_job_role <- attrition_key_var_dta %>%
group_by(job_role) %>%
summarise(
count_attrition = sum(bi_attrition, na.rm = TRUE), # Count of employees who left
total_count = n(), # Total number of employees in that job role
pct_attrition = count_attrition / total_count * 100 # Attrition rate in percentage
) %>%
ungroup() # Ungroup to avoid carrying over the grouping
## print attrition_rate_job_role
print(attrition_rate_job_role)
# A tibble: 13 × 4
job_role count_attrition total_count pct_attrition
<chr> <dbl> <int> <dbl>
1 Analytics Manager 28 213 13.1
2 Data Scientist 597 1387 43.0
3 Engineering Manager 18 307 5.86
4 HR Business Partner 0 25 0
5 HR Executive 29 119 24.4
6 HR Manager 0 17 0
7 Machine Learning Engineer 95 582 16.3
8 Manager 19 145 13.1
9 Recruiter 86 152 56.6
10 Sales Executive 543 1567 34.7
11 Sales Representative 317 500 63.4
12 Senior Software Engineer 84 512 16.4
13 Software Engineer 445 1373 32.4
## Plot the attrition rate
ggplot(attrition_rate_job_role, aes(x = reorder(job_role, pct_attrition), y = pct_attrition)) +
geom_col(fill = "violet", color = "black", width = 0.7) + # Bar color with border
geom_text(aes(label = round(pct_attrition, 1)), vjust = -0.5, size = 5, color = "white", fontface = "bold") + # Add labels on top of bars
labs(title = "Attrition Rate by Job Role",
subtitle = "Understanding Employee Turnover Across Roles",
x = "Job Role",
y = "Attrition Rate (%)") +
theme_dark(base_size = 15) +
theme(axis.text.x = element_text(angle = 45, hjust = 1, color = "#333333"), # Rotate and color x-axis labels
axis.title = element_text(face = "bold", color = "#333333"),
plot.title = element_text(face = "bold", size = 20, color = "red"), # Title color
plot.subtitle = element_text(face = "italic", size = 10, color = "#666666"), # Subtitle color
panel.grid.major.y = element_line(color = "#E0E0E0"), # Customize grid lines
panel.grid.minor = element_blank()) + # Remove minor grid lines
coord_cartesian(ylim = c(0, max(attrition_rate_job_role$pct_attrition) + 10)) + # Set y-axis limit
scale_y_continuous(breaks = seq(0, max(attrition_rate_job_role$pct_attrition) + 10, by = 10)) # Y-axis breaks
5.2 Identifying attrition key drivers using correlation analysis
Conduct a correlation analysis of key variables: bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Use the cor() function to run the correlation analysis. Remove missing values using na.omit() before running the correlation analysis. Save the output in hr_corr.
Use a correlation matrix or heatmap to visualize the relationship between these variables and attrition. You can use the GGally package and use the ggcorr function to visualize the correlation heatmap. You may explore this site for more information: ggcorr.
Discuss which factors seem most correlated with attrition and what that suggests about why employees are leaving.
## conduct correlation of key variables.
# Select key variables and remove missing values
key_vars <- hr_perf_dta %>%
select(bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, work_life_balance) %>%
na.omit() # Remove missing values
# Calculate the correlation matrix
hr_corr <- cor(key_vars)
## print hr_corr
print(hr_corr)
bi_attrition salary years_at_company job_satisfaction
bi_attrition 1.000000000 -0.211181478 -0.6896527798 0.0132368129
salary -0.211181478 1.000000000 0.2206442116 0.0053054850
years_at_company -0.689652780 0.220644212 1.0000000000 0.0008700583
job_satisfaction 0.013236813 0.005305485 0.0008700583 1.0000000000
manager_rating -0.007654429 -0.001596736 0.0178656879 -0.0158205481
work_life_balance 0.003428836 -0.001517145 0.0079339508 0.0417242942
manager_rating work_life_balance
bi_attrition -0.007654429 0.003428836
salary -0.001596736 -0.001517145
years_at_company 0.017865688 0.007933951
job_satisfaction -0.015820548 0.041724294
manager_rating 1.000000000 0.007996938
work_life_balance 0.007996938 1.000000000
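To see at a glance which variables are most related to attrition, the bi_attrition row of the matrix can be ranked by absolute size. The sketch below copies the values printed above into a named vector so it runs on its own; with the live object, something like hr_corr["bi_attrition", -1] should give the same vector.

```r
# Correlations with bi_attrition, copied from the printed hr_corr output
attrition_corr <- c(
  salary            = -0.211181478,
  years_at_company  = -0.689652780,
  job_satisfaction  =  0.013236813,
  manager_rating    = -0.007654429,
  work_life_balance =  0.003428836
)

# Rank by absolute value so the strongest relationships come first
ranked <- attrition_corr[order(abs(attrition_corr), decreasing = TRUE)]
print(round(ranked, 3))
```

Ranked this way, years_at_company and salary come first, while the satisfaction and rating variables are all close to zero.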
## install GGally package and use ggcorr function to visualize the correlation
#install.packages("GGally")
#load the package
library(GGally)
# Create the correlation heatmap using ggcorr
correlation_plot <- ggcorr(key_vars,
method = c("everything", "pearson"), # Specify the correlation method
label = TRUE, # Show correlation coefficients
label_size = 4, # Size of the correlation labels
label_color = "black", # Color of the labels
high = "orange", # Color for high correlations
low = "purple", # Color for low correlations
mid = "pink", # Color for mid correlations
midpoint = 0, # Midpoint for colors
size = 3, # Size of the variable labels
layout.exp = 0.5) # Extra room on the left so long variable names are not clipped
# Add title and customize theme
correlation_plot +
ggtitle("Correlation Heatmap of Key Variables") + # Title of the plot
  theme_dark(base_size = 15) + # Dark theme, consistent with the other plots
  theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"), # Center title
        axis.text.x = element_text(angle = 30, hjust = 1, size = 10), # Rotate x-axis labels
        axis.text.y = element_text(size = 12)) # Adjust y-axis labels
Provide your discussion here.
Key correlations
Years at company has by far the strongest relationship with attrition (r = -0.69), a strong negative correlation: employees with shorter tenure are much more likely to have left.
Salary has a weak negative correlation with attrition (r = -0.21), implying that lower-paid employees are somewhat more likely to leave.
Job satisfaction (r = 0.01), manager rating (r = -0.01), and work-life balance (r = 0.003) are essentially uncorrelated with attrition in this dataset.
Salary and years at company are themselves positively correlated (r = 0.22), so part of the salary relationship may reflect tenure.
Implications
Attrition is concentrated among newer and lower-paid employees, so HR should give priority to retaining employees in their first years at the company.
The satisfaction and rating scores show no linear association with attrition, so these results do not support treating job satisfaction, manager rating, or work-life balance as the main drivers of turnover.
This analysis suggests that tenure, and to a lesser extent salary, are the factors most closely tied to attrition. Correlation does not establish causation, so these findings indicate where to look rather than why employees leave.
5.3 Predictive modeling for attrition
Create a logistic regression model to predict employee attrition using the following variables: salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Save the model as hr_attrition_glm_model. Print the summary of the model using the summary function.
Install the sjPlot package and use the tab_model function to display the summary of the model. You may read the documentation here on how to customize your model summary.
Also, use the plot_model function to visualize the model coefficients. You may read the documentation here on how to customize your model visualization.
Discuss the results of the logistic regression model and what they suggest about the factors that contribute to employee attrition.
## run a logistic regression model to predict employee attrition
## save the model as hr_attrition_glm_model
hr_attrition_glm_model <- glm(
  bi_attrition ~ salary + years_at_company + job_satisfaction + manager_rating + work_life_balance,
  data = hr_perf_dta,
  family = "binomial"
)
## print the summary of the model using the summary function
summary(hr_attrition_glm_model)
Call:
glm(formula = bi_attrition ~ salary + years_at_company + job_satisfaction +
manager_rating + work_life_balance, family = "binomial",
data = hr_perf_dta)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.571e+00 2.173e-01 11.831 <2e-16 ***
salary -3.633e-06 4.086e-07 -8.893 <2e-16 ***
years_at_company -6.333e-01 1.476e-02 -42.919 <2e-16 ***
job_satisfaction 3.470e-02 3.186e-02 1.089 0.276
manager_rating 5.071e-03 3.810e-02 0.133 0.894
work_life_balance 2.587e-02 3.198e-02 0.809 0.419
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8574.5 on 6708 degrees of freedom
Residual deviance: 4781.6 on 6703 degrees of freedom
(190 observations deleted due to missingness)
AIC: 4793.6
Number of Fisher Scoring iterations: 5
## install sjPlot package and use tab_model function to display the summary of the model
#install.packages("sjPlot")
# Use tab_model to display the summary of the logistic regression model
tab_model(hr_attrition_glm_model,
show.p = TRUE, # Show p-values
show.se = TRUE, # Show standard errors
dv.labels = "Attrition Prediction Model", # Label for dependent variable
string.pred = "Predictors", # Custom label for predictors
string.ci = "Confidence Interval", # Custom label for confidence intervals
title = "Summary of Logistic Regression for Employee Attrition")
Attrition Prediction Model
| Predictors | Odds Ratios | std. Error | Confidence Interval | p |
|---|---|---|---|---|
| (Intercept) | 13.08 | 2.84 | 8.56 – 20.07 | <0.001 |
| salary | 1.00 | 0.00 | 1.00 – 1.00 | <0.001 |
| years at company | 0.53 | 0.01 | 0.52 – 0.55 | <0.001 |
| job satisfaction | 1.04 | 0.03 | 0.97 – 1.10 | 0.276 |
| manager rating | 1.01 | 0.04 | 0.93 – 1.08 | 0.894 |
| work life balance | 1.03 | 0.03 | 0.96 – 1.09 | 0.419 |
| Observations | 6709 | | | |
| R2 Tjur | 0.502 | | | |
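The odds ratios in the table are the exponentiated log-odds estimates from the summary() output. The sketch below reproduces that conversion from the printed coefficients, and also rescales the salary effect to a 10,000-unit change, since the per-unit odds ratio rounds to 1.00. The 10,000 step is an arbitrary choice made here for illustration.

```r
# Log-odds estimates copied from the summary() output above
log_odds <- c(
  intercept         =  2.571,
  salary            = -3.633e-06,
  years_at_company  = -0.6333,
  job_satisfaction  =  0.03470,
  manager_rating    =  0.005071,
  work_life_balance =  0.02587
)

# Exponentiating a log-odds coefficient gives the odds ratio
odds_ratios <- exp(log_odds)
print(round(odds_ratios, 2))

# Odds ratio for a 10,000-unit increase in salary (illustrative step size)
or_salary_10k <- exp(10000 * log_odds[["salary"]])
print(round(or_salary_10k, 3))
```

On these figures, a 10,000 increase in salary corresponds to an odds ratio of about 0.96, and each additional year at the company to about 0.53.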
## use plot_model function to visualize the model coefficients
plot_model(hr_attrition_glm_model,
  type = "est", # Plot the model estimates (shown as odds ratios by default for a logistic model)
  show.values = TRUE, # Show the estimate values
  show.p = TRUE, # Mark significant estimates with asterisks
  value.offset = 0.3, # Move the value labels slightly away from the points for clarity
  title = "Logistic Regression Odds Ratios for Attrition Prediction",
  axis.title = c("Predictors", "Odds Ratios"),
line.size = 1.2, # Make the lines slightly thicker
colors = "Set2") +
theme_minimal() + # Apply a minimalist theme for clean appearance
theme(
plot.title = element_text(hjust = 0.5, size = 18, face = "bold", color = "blue"), # Center and style title
axis.text.x = element_text(size = 12, face = "bold", color = "black"), # Customize x-axis label appearance
axis.text.y = element_text(size = 12, face = "bold", color = "black"), # Customize y-axis label appearance
axis.title.x = element_text(size = 14, face = "italic", color = "maroon"), # Customize x-axis title
axis.title.y = element_text(size = 14, face = "italic", color = "maroon"), # Customize y-axis title
    panel.grid.major = element_line(color = "gray90")) # Light gridlines for a cleaner look
Provide your discussion here.
answer:
5.4 Discussion and summary of Predictive Modeling for Attrition:
Logistic Regression Model Summary: The logistic regression model was created to predict employee attrition using key predictors like salary, years at company, job satisfaction, manager rating, and work-life balance. The output of the model reveals which of these factors significantly impact the likelihood of an employee leaving the company.
Key Findings:
Years at company is the strongest predictor: each additional year is associated with about 47% lower odds of attrition (odds ratio 0.53, p < 0.001).
Salary is also statistically significant (p < 0.001) with a negative coefficient. Its odds ratio rounds to 1.00 only because it is expressed per single unit of salary.
Job satisfaction (p = 0.276), manager rating (p = 0.894), and work-life balance (p = 0.419) are not statistically significant once tenure and salary are in the model.
Coefficient Visualization: The coefficient plot shows the direction and magnitude of each estimate. Years at company sits well below the no-effect line, while the confidence intervals for job satisfaction, manager rating, and work-life balance all include it.
5.5 Recommendations:
Enhance Job Satisfaction: Implement strategies that boost employee satisfaction, such as offering professional development opportunities, recognizing achievements, and ensuring role alignment.
Improve Work-Life Balance: Introduce flexible work arrangements, improve workload management, and promote a healthy work-life culture.
Manager Training and Support: Invest in leadership development programs to ensure managers are equipped to support, motivate, and retain their team members.
These are general retention practices. Because the model identifies years at company and salary as the only statistically significant predictors, they are likely to help most when targeted at newer and lower-paid employees.
5.6 Analysis of compensation and turnover
Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.
Install the report package and use the report function to generate a report of the t-test results.
Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.
Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.
Provide recommendations on whether revising compensation policies could be an effective retention strategy.
## compare the average monthly income of employees who left and those who stayed
attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)
# Print the t-test results
print(attrition_ttest_results)
Welch Two Sample t-test
data: salary by bi_attrition
t = 18.869, df = 5524.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
38577.82 47523.18
sample estimates:
mean in group 0 mean in group 1
125007.26 81956.76
## install the report package and use the report function to generate a report of the t-test results
#install.packages("report")
# Generate a report of the t-test results
report_ttest <- report(attrition_ttest_results)
# Print the t-test report
print(report_ttest)
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.25e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43050.50, 95% CI [38577.82, 47523.18], t(5524.24) = 18.87, p < .001; Cohen's d
= 0.51, 95% CI [0.45, 0.56])
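The figures in the report can be cross-checked with simple arithmetic. The sketch below uses only numbers printed in the t-test output above. The relative gap is an extra summary computed here, not something the report itself prints.

```r
# Group means and confidence interval copied from the t-test output above
mean_stayed <- 125007.26  # group 0
mean_left   <- 81956.76   # group 1
ci          <- c(38577.82, 47523.18)

# Difference in means, as quoted in the report
mean_diff <- mean_stayed - mean_left
print(mean_diff)

# The Welch confidence interval is centred on the difference in means
ci_midpoint <- mean(ci)
print(ci_midpoint)

# Relative gap: how much lower the leavers' mean is, as a percentage
pct_lower <- 100 * mean_diff / mean_stayed
print(round(pct_lower, 1))
```

On these numbers, employees who left earned roughly a third less on average than those who stayed.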
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed
#install.packages("ggstatsplot")
# Load ggstatsplot package
library(ggstatsplot)
# Visualize the distribution of monthly income for employees who left and stayed
ggbetweenstats(
data = hr_perf_dta, # Your dataset
x = bi_attrition, # Categorical variable (Attrition status: 0 = stayed, 1 = left)
y = salary, # Continuous variable (Monthly income/salary)
xlab = "Attrition Status", # Label for x-axis
ylab = "Monthly Income", # Label for y-axis
title = "Comparison of Monthly Income for Employees Who Left vs. Stayed", # Plot title
messages = FALSE, # Turn off messages in the plot
ggtheme = theme_dark(), # Apply a dark theme
palette = "Set2", # Use a color palette for clarity
point.args = list(alpha = 0.6, size = 3, color = "orange") # Customize points with transparency and size
) +
theme(
plot.title = element_text(hjust = 0.5, size = 22, face = "bold", color = "black"), # Larger, bold, and centered title
axis.title.x = element_text(size = 16, face = "italic", color = "yellowgreen"), # Styled x-axis title
axis.title.y = element_text(size = 16, face = "italic", color = "yellowgreen"), # Styled y-axis title
axis.text.x = element_text(size = 14, color = "black"), # Larger and colored x-axis labels
axis.text.y = element_text(size = 14, color = "black"), # Larger and colored y-axis labels
legend.position = "top", # Place legend on top for better visibility
legend.text = element_text(size = 12), # Increase legend text size
panel.grid.major = element_line(color = "gray85"), # Use light grid lines for major grid
panel.grid.minor = element_blank() # Remove minor grid lines for a cleaner look
)
# create histogram and frequency polygon of salary for employees who left and those who stayed
ggplot(hr_perf_dta, aes(x = salary, fill = as.factor(bi_attrition))) +
geom_histogram(alpha = 0.4, position = "identity", bins = 30, color = "black") + # Histogram
geom_freqpoly(aes(color = as.factor(bi_attrition)), size = 1.5, bins = 30) + # Frequency polygon
facet_wrap(~ as.factor(bi_attrition), scales = "free_y", ncol = 1) + # Facet by attrition
scale_fill_manual(values = c("purple", "violet"), name = "Attrition Status",
labels = c("Stayed", "Left")) + # Custom colors for fill
scale_color_manual(values = c("black", "black"), name = "Attrition Status",
labels = c("Stayed", "Left")) + # Custom colors for frequency polygon
labs(title = "Distribution of Salary by Attrition Status",
subtitle = "Comparing the salary distribution of employees who left vs those who stayed",
x = "Salary", y = "Count",
fill = "Attrition Status", color = "Attrition Status") + # Axis labels and title
theme_dark(base_size = 15) + # Dark theme, consistent with the other plots
theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"), # Center and style title
plot.subtitle = element_text(hjust = 0.5, size = 16), # Center subtitle
axis.text.x = element_text(angle = 0, hjust = 0.5), # Align x-axis text
axis.text.y = element_text(size = 12)) # Adjust y-axis text size
Provide your discussion here.
Answer:
5.7 Discussion on Analysis of Compensation and Turnover
A Welch two-sample t-test was conducted to compare the average salary of employees who left the company (bi_attrition = 1) against those who stayed (bi_attrition = 0). The results are stored in the variable attrition_ttest_results.
T-Test Analysis: The difference between the two groups is statistically significant (t = 18.87, df = 5524.2, p < .001). Employees who stayed earned 125,007 on average, compared with 81,957 for those who left, a difference of 43,051 (95% CI 38,578 to 47,523) with a medium effect size (Cohen's d = 0.51). This suggests that compensation plays a role in turnover.
Visualizations: The ggbetweenstats plot shows the salary distributions for both groups, making the gap between them easier to see. The histogram and frequency polygon further illustrate the differences in salary levels.
5.8 Recommendations for Addressing Compensation and Turnover
Revise Compensation Policies:
Ensure salaries are competitive by comparing them with industry benchmarks.
Implement bonuses and salary adjustments for high-performing employees to motivate retention.
Regularly assess and adjust employee salaries based on performance and market conditions.
Focus on Retention Strategies:
Provide comprehensive benefits, including health insurance and wellness programs, to improve job satisfaction.
Offer options such as remote work and flexible hours to support work-life balance.
Provide training, mentorship, and professional growth opportunities to foster loyalty.
Monitor Employee Feedback:
Use surveys to assess employee satisfaction and identify areas for improvement.
Create an environment where employees feel comfortable sharing feedback.
Share survey results and outline specific actions taken in response to employee input.
5.10 Employee satisfaction and performance analysis
Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and summarise functions to calculate the average performance ratings for each group.
Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.
Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.
Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.
Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.
# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.
# Load the required libraries
library(dplyr)
library(ggplot2)
library(tidyr)
# Step 1: Calculate average ratings by attrition status
average_ratings <- hr_perf_dta %>%
group_by(bi_attrition) %>%
summarise(
Avg_Manager_Rating = mean(manager_rating, na.rm = TRUE),
Avg_Self_Rating = mean(self_rating, na.rm = TRUE),
.groups = 'drop'
)
# Print the average ratings for inspection
print(average_ratings)
# A tibble: 2 × 3
  bi_attrition Avg_Manager_Rating Avg_Self_Rating
         <dbl>              <dbl>           <dbl>
1            0               3.48            3.98
2            1               3.46            3.99
# Step 2: Create a bar plot for visual comparison
# Melt the data for easier plotting
average_ratings_long <- average_ratings %>%
pivot_longer(cols = c(Avg_Manager_Rating, Avg_Self_Rating),
names_to = "Rating_Type",
values_to = "Average_Rating")
# Plot the average ratings
ggplot(average_ratings_long, aes(x = as.factor(bi_attrition), y = Average_Rating, fill = Rating_Type)) +
geom_bar(stat = "identity", position = "dodge") + # Use dodge position for side-by-side bars
scale_fill_manual(values = c("violet", "purple"),
name = "Rating Type",
labels = c("Manager Rating", "Self Rating")) + # Custom colors
labs(title = "Average Performance Ratings by Attrition Status",
x = "Attrition Status (0 = Stayed, 1 = Left)",
y = "Average Rating") + # Axis labels
theme_minimal(base_size = 15) + # Minimal theme for a clean look
theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold"), # Center and style title
axis.text.x = element_text(size = 12), # Customize x-axis labels
axis.text.y = element_text(size = 12)) # Customize y-axis labels
# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.
# Load ggplot2 package if not already loaded
library(ggplot2)
ggplot(hr_perf_dta, aes(x = factor(bi_attrition), fill = factor(self_rating))) +
geom_bar(position = "dodge", color = "black", size = 0.7) + # Adding black borders to bars for clarity
scale_fill_manual(values = c("violet", "lavender", "pink", "#e78ac3", "red")) + # Custom color palette
labs(
title = "Self-Rating Distribution: Employees Who Left vs. Stayed",
x = "Attrition Status (0 = Stayed, 1 = Left)",
y = "Number of Employees",
fill = "Self-Rating"
) +
theme_dark(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, size = 20, face = "bold", color = "#2c3e50"), # Center and style the title
axis.title.x = element_text(size = 14, color = "#34495e"), # Custom axis title color
axis.title.y = element_text(size = 14, color = "grey"),
axis.text.x = element_text(size = 12, color = "grey"),
axis.text.y = element_text(size = 12, color = "grey"),
legend.position = "top", # Move legend to the top for better readability
legend.title = element_text(size = 13, face = "bold"),
legend.text = element_text(size = 11),
panel.grid.major = element_line(color = "grey90"), # Lighten grid lines
panel.grid.minor = element_blank() # Remove minor grid lines for simplicity
)
# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.
# Load necessary library
library(ggplot2)
# Create the bar plot for ManagerRating distribution
ggplot(hr_perf_dta, aes(x = factor(bi_attrition), fill = factor(manager_rating))) +
geom_bar(position = "dodge", color = "black", size = 0.7) + # Dodge bars for separate ManagerRatings
scale_fill_brewer(palette = "Set2") + # Use a nice color palette for the fill
labs(title = "Distribution of ManagerRating for Employees Who Left vs. Stayed",
x = "Attrition Status (0 = Stayed, 1 = Left)",
y = "Count",
fill = "Manager Rating") + # Labels for axes and legend
theme_dark(base_size = 15) + # Dark theme with a larger base font
theme(
plot.title = element_text(hjust = 0.5, size = 20, face = "bold", color = "black"), # Centered title
axis.title.x = element_text(size = 16, face = "italic", color = "yellowgreen"), # Styled x-axis label
axis.title.y = element_text(size = 16, face = "italic", color = "yellowgreen"), # Styled y-axis label
axis.text.x = element_text(size = 14, color = "gray"), # Larger x-axis labels
axis.text.y = element_text(size = 14, color = "gray"), # Larger y-axis labels
legend.position = "top", # Legend at the top
legend.text = element_text(size = 12), # Bigger legend text
panel.grid.major = element_line(color = "gray85"), # Light grid lines
panel.grid.minor = element_blank() # No minor grid lines
)
# Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
# Load necessary library
library(ggplot2)
# Create a boxplot for Salary by Job Satisfaction and Attrition Status
ggplot(hr_perf_dta, aes(x = factor(job_satisfaction), y = salary, fill = factor(bi_attrition))) +
geom_boxplot(outlier.color = "red", outlier.size = 2, alpha = 0.7) + # Add boxplot with outliers
scale_fill_manual(values = c("#0000AA", "#EE77BB"), labels = c("Stayed", "Left")) + # Custom fill colors (full six-digit hex codes) and labels
labs(title = "Salary Distribution by Job Satisfaction and Attrition Status",
x = "Job Satisfaction Level",
y = "Salary",
fill = "Attrition Status") + # Add axis labels and title
theme_dark(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, size = 18, face = "bold", color = "navyblue"), # Centered title
axis.title.x = element_text(size = 16, face = "italic", color = "navyblue"), # Styled x-axis label
axis.title.y = element_text(size = 16, face = "italic", color = "green"), # Styled y-axis label
axis.text.x = element_text(size = 14, color = "darkblue"), # Larger x-axis labels
axis.text.y = element_text(size = 14, color = "darkblue"), # Larger y-axis labels
legend.position = "top", # Move legend to the top
panel.grid.major = element_line(color = "gray85"), # Light major grid lines
panel.grid.minor = element_blank() # Remove minor grid lines
)
Provide your discussion here.
Answer:
The calculation of average performance ratings (both manager_rating and self_rating) showed almost no difference between employees who left and those who stayed.
Findings:
Employees who left (bi_attrition = 1) had an average ManagerRating of 3.46, compared with 3.48 for those who stayed (bi_attrition = 0). Their average SelfRating was 3.99, compared with 3.98.
Differences of 0.01 to 0.02 rating points are too small to suggest that performance ratings, on their own, separate the two groups. In both groups, self-ratings run about half a point above manager ratings. That gap between self-assessment and manager assessment may deserve HR's attention regardless of attrition status.
5.10.1 Distribution of SelfRating and ManagerRating
The bar plots show the distribution of SelfRating and ManagerRating by attrition status.
self_rating:
- The bars for employees who stayed are taller at every rating level, largely because that group is about twice as large as the group that left. Since the average self-ratings are nearly identical (3.98 vs. 3.99), the plot does not point to a concentration of lower self-ratings among employees who left.
manager_rating:
- The ManagerRating distribution follows the same pattern. With averages of 3.48 (stayed) and 3.46 (left), there is little evidence that employees who left received systematically lower ratings from their managers.
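Because the two attrition groups differ in size, raw counts can make one group look different simply because it is larger. A minimal sketch, using simulated ratings rather than hr_perf_dta, shows how to compare the share of each rating within each group. The rating range 3 to 5 and the group sizes are assumptions made only for this illustration.

```r
# Simulated stand-in for hr_perf_dta (illustrative values only)
set.seed(147)
toy <- data.frame(
  bi_attrition = rep(c(0, 1), times = c(400, 200)),
  self_rating  = sample(3:5, size = 600, replace = TRUE)
)

# Counts of each rating within each attrition group
rating_counts <- table(attrition = toy$bi_attrition,
                       self_rating = toy$self_rating)

# Row-wise proportions: each attrition group sums to 1
rating_share <- prop.table(rating_counts, margin = 1)
print(round(rating_share, 3))
```

In ggplot2, geom_bar(position = "fill") gives the same within-group view graphically, which makes groups of unequal size directly comparable.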
5.10.2 Salary by Job Satisfaction and Attrition
The boxplot illustrating the relationship between salary, job satisfaction, and attrition provided additional context.
Observations:
Higher salary levels were generally associated with higher job satisfaction ratings.
Employees who left tended to report lower job satisfaction and lower salary levels, suggesting that inadequate compensation and dissatisfaction with job conditions may have contributed to their decision to leave.
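Observations read off a boxplot are easier to verify with a numeric summary. The sketch below computes the median salary for each combination of job satisfaction level and attrition status using base R's aggregate(). The data frame toy and all of its values are simulated assumptions; only the column names mirror those used in the plots above.

```r
# Simulated stand-in for hr_perf_dta (illustrative values only)
set.seed(147)
n <- 500
toy <- data.frame(
  job_satisfaction = sample(1:5, n, replace = TRUE),
  bi_attrition     = rbinom(n, size = 1, prob = 0.33),
  salary           = round(rlnorm(n, meanlog = log(60000), sdlog = 0.5))
)

# Median salary per job satisfaction level and attrition status
salary_summary <- aggregate(
  salary ~ job_satisfaction + bi_attrition,
  data = toy,
  FUN  = median
)
names(salary_summary)[3] <- "median_salary"

# Sort so the stayed/left rows for each satisfaction level sit next to each other
salary_summary <- salary_summary[order(salary_summary$job_satisfaction,
                                       salary_summary$bi_attrition), ]
print(salary_summary)
```

Medians are used here because salary distributions are typically right-skewed, and the median is also the center line that a boxplot draws.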
5.11 Recommendations
Improve Performance Management Practices:
Implement continuous performance feedback systems that provide employees with timely insights into their performance and growth areas. This can help employees feel valued and more engaged.
Train managers to provide constructive feedback and recognize employee contributions effectively. This will help improve ManagerRating scores and foster a more supportive work environment.
Develop programs to enhance job satisfaction, such as team-building activities, wellness programs, and opportunities for career advancement. Engaging employees can improve their perceptions of job satisfaction and reduce turnover.
Re-evaluate compensation packages to ensure they are competitive and aligned with industry standards. Consider implementing performance bonuses to reward high-performing employees effectively.
Implement exit interviews to gather insights from departing employees about their experiences. Understanding their reasons for leaving can provide valuable information for improving retention strategies.
Conduct regular employee satisfaction surveys to assess job satisfaction, performance perceptions, and overall engagement. Use this data to make informed decisions regarding policies and practices.
5.12 Conclusion
The analysis underscores the importance of employee satisfaction and performance ratings in retention strategies. By addressing the gaps in performance perceptions, enhancing job satisfaction, and revising compensation policies, HR can create a more positive work environment, ultimately reducing turnover and fostering a more engaged workforce.
5.13 Work-life balance and retention strategies
At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:
Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.
Use visualizations to show the differences.
Assess whether employees with poor work-life balance are more likely to leave.
You are free to decide how to accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.
# Step 1: Analyze the distribution of WorkLifeBalance ratings for employees who left vs. stayed
# Bar plot showing the distribution of WorkLifeBalance for employees who stayed vs. left
ggplot(hr_perf_dta, aes(x = factor(work_life_balance), fill = factor(bi_attrition))) +
geom_bar(position = "dodge", color = "black", size = 0.7) +
scale_fill_manual(values = c("#774936", "#babd8d")) + # Custom colors for staying vs leaving
labs(title = "Distribution of WorkLifeBalance for Employees Who Left vs. Stayed",
x = "WorkLifeBalance Rating",
y = "Number of Employees",
fill = "Attrition Status\n(0 = Stayed, 1 = Left)") +
theme_minimal(base_size = 15) +
theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
axis.text.x = element_text(size = 12),
axis.text.y = element_text(size = 12),
legend.position = "right")
# Step 2: Assess whether employees with poor work-life balance are more likely to leave
# Create a contingency table for work_life_balance vs. bi_attrition
worklife_attrition_table <- table(hr_perf_dta$work_life_balance, hr_perf_dta$bi_attrition)
# Display the contingency table
worklife_attrition_table
       0    1
  1   84   37
  2 1134  568
  3 1090  580
  4 1146  560
  5  994  516
# Perform Chi-Square Test to assess the relationship
chi_sq_test <- chisq.test(worklife_attrition_table)
# Display the test result
chi_sq_test
Pearson's Chi-squared test
data: worklife_attrition_table
X-squared = 2.138, df = 4, p-value = 0.7104
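The chi-square result is easier to interpret next to the attrition rate at each rating level. The sketch below rebuilds the contingency table printed above by hand, so it runs without hr_perf_dta, then computes the share of employees who left per rating and Cramér's V as an effect size. The names attrition_rate and cramers_v are illustrative and not part of the original analysis.

```r
# Contingency table copied from the output above
# (rows: work_life_balance 1-5; columns: bi_attrition 0 = stayed, 1 = left)
worklife_attrition_table <- matrix(
  c(  84,  37,
    1134, 568,
    1090, 580,
    1146, 560,
     994, 516),
  ncol = 2, byrow = TRUE,
  dimnames = list(work_life_balance = as.character(1:5),
                  bi_attrition = c("0", "1"))
)

# Share of employees who left within each work-life balance rating
attrition_rate <- prop.table(worklife_attrition_table, margin = 1)[, "1"]
print(round(attrition_rate, 3))

# Chi-square test and Cramér's V (effect size for a contingency table)
chi_sq_test <- chisq.test(worklife_attrition_table)
n_total     <- sum(worklife_attrition_table)
cramers_v   <- sqrt(unname(chi_sq_test$statistic) /
                    (n_total * (min(dim(worklife_attrition_table)) - 1)))
cat("Cramér's V:", round(cramers_v, 3), "\n")
```

With these counts the attrition rate stays between roughly 31% and 35% at every rating level, and Cramér's V comes out near 0.02, which is consistent with the non-significant p-value.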
# Step 3: Visualize WorkLifeBalance and attrition with a boxplot
# Enhanced Boxplot to compare WorkLifeBalance ratings for employees who stayed vs left
ggplot(hr_perf_dta, aes(x = factor(bi_attrition), y = work_life_balance, fill = factor(bi_attrition))) +
geom_boxplot(color = "black", size = 0.8, width = 0.6, outlier.color = "blue", outlier.shape = 16, outlier.size = 2) +
scale_fill_manual(values = c("#f7e1d7", "#edafb8")) + # Custom colors: first value = stayed (0), second value = left (1)
labs(title = "Work-Life Balance Comparison: Employees Who Stayed vs. Left",
x = "Attrition Status (0 = Stayed, 1 = Left)",
y = "Work-Life Balance Rating") +
theme_minimal(base_size = 15) +
theme(
plot.title = element_text(hjust = 0.5, size = 20, face = "bold", color = "black"), # Centered bold title
axis.title.x = element_text(size = 15, face = "bold"),
axis.title.y = element_text(size = 15, face = "bold"),
axis.text.x = element_text(size = 12, color = "black", face = "bold"), # Custom x-axis labels
axis.text.y = element_text(size = 12, color = "black"), # Custom y-axis labels
panel.grid.major = element_line(color = "gray90"), # Light major grid lines
panel.grid.minor = element_blank(), # Remove minor grid lines
panel.border = element_blank(), # Remove plot border
legend.position = "none" # Hide legend for simplicity
)
5.14 Insights from the Visualizations
Distribution of Work-Life Balance Ratings:
For Employees Who Stayed (Attrition = 0): Ratings are concentrated in levels 2 through 5 (1,134, 1,090, 1,146, and 994 employees), with only 84 employees at level 1.
For Employees Who Left (Attrition = 1): The distribution has the same shape (568, 580, 560, and 516 employees at levels 2 through 5, and 37 at level 1), so the two groups look very similar in both the bar plot and the boxplot.
The share of employees who left is close to one third at every rating level: about 31% at level 1, 33% at level 2, 35% at level 3, 33% at level 4, and 34% at level 5.
The chi-square test (X-squared = 2.138, df = 4, p-value = 0.7104) gives no evidence of an association between work_life_balance and attrition in this dataset.
Employees with the lowest rating (level 1) form a small group of 121 people, and their attrition rate is not higher than that of the other levels. In this data, a poor work-life balance rating alone does not mark an employee as more likely to leave, so HR should not rely on this rating by itself to identify at-risk employees.
Based on the analysis conducted, provide recommendations for HR interventions that could help reduce employee attrition and improve overall employee satisfaction and performance. You may use the following questions as a guide for your recommendations and discussion.
What are the key factors contributing to employee attrition in the company?
Which factors are most strongly correlated with attrition?
What strategies could be implemented to improve employee retention and satisfaction?
How can HR leverage the insights from the analysis to develop effective retention strategies?
What are the potential benefits of implementing these strategies for the company?
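Answer:
Key Factors Contributing to Attrition:
- Candidate drivers discussed in this project include job satisfaction, compensation, career development opportunities, and workload. Work-life balance ratings and average performance ratings did not differ meaningfully between employees who left and those who stayed.
Strongly Correlated Factors:
- None of the factors shown above stands out as strongly correlated with attrition: the chi-square test for work_life_balance is not significant (p-value = 0.7104), and the average manager and self ratings differ by at most 0.02 points between the groups. Compensation and job satisfaction remain the most plausible candidates and should be confirmed against the t-test results and the salary boxplot.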
Retention Strategies:
Implement flexible work policies (e.g., remote work, flexible hours).
Regularly review and adjust compensation and benefits.
Provide career development opportunities and mentorship.
Train managers to improve relationships with employees.
Monitor workload to prevent burnout.
Leveraging Insights:
Use work-life balance ratings to identify at-risk employees and offer targeted interventions.
Conduct regular employee satisfaction surveys to monitor engagement.
Continuously assess and adjust strategies for retention effectiveness.
Potential Benefits:
- Reduced attrition, improved employee satisfaction and engagement, enhanced company reputation, and increased overall performance.
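One way to compare several candidate factors at once is a logistic regression with attrition as the outcome. The sketch below is a hedged illustration on simulated data, not a result from hr_perf_dta: the data frame toy, the coefficient used to generate the outcome, and the choice of predictors are all assumptions. By construction, only salary influences attrition in the simulated data, so the other predictors should show odds ratios close to 1.

```r
# Simulated stand-in for hr_perf_dta (illustrative values only)
set.seed(147)
n <- 1000
toy <- data.frame(
  salary            = rlnorm(n, meanlog = log(60000), sdlog = 0.4),
  job_satisfaction  = sample(1:5, n, replace = TRUE),
  work_life_balance = sample(1:5, n, replace = TRUE)
)

# Simulated outcome: lower salary raises the chance of leaving (assumed effect)
log_odds <- -0.7 - 0.8 * as.numeric(scale(log(toy$salary)))
toy$bi_attrition <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression with standardized predictors so effects are comparable
attrition_model <- glm(
  bi_attrition ~ scale(log(salary)) + scale(job_satisfaction) +
    scale(work_life_balance),
  data = toy, family = binomial
)

# Odds ratios per one-standard-deviation increase, with Wald 95% intervals
odds_ratios <- exp(coef(attrition_model))
or_table <- cbind(odds_ratio = odds_ratios,
                  exp(confint.default(attrition_model)))
print(round(or_table, 2))
```

An odds ratio whose interval excludes 1 points to a factor associated with attrition after accounting for the others. Applied to the real data, a table like this would help HR decide which of the strategies listed above to prioritize.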